Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes

Authors

  • Haridimos Kondylakis
  • Niv Dayan
  • Konstantinos Zoumpatianos
  • Themis Palpanas

Abstract

Many modern applications produce massive amounts of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes used for this purpose do not scale well to massive datasets in terms of either performance or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. This leads to two design problems. First, traditional bulk-loading algorithms based on sorting cannot be used. Instead, index construction takes place through slow top-down insertions, which create a non-contiguous index that results in many random I/Os. Second, data series cannot be sorted and split evenly across nodes based on their median value; thus, in practice most leaf nodes are nearly empty. This further slows down queries and amplifies storage costs. To address these problems, we present Coconut. The first innovation in Coconut is an inverted, sortable data series summarization that organizes data series based on a z-order curve, keeping similar series close to each other in the sorted order. As a result, Coconut is able to use bulk-loading techniques that rely on sorting to quickly build a contiguous index using large sequential disk I/Os. We then explore prefix-based and median-based splitting policies for bottom-up bulk-loading, showing that median-based splitting outperforms the state of the art and ensures that all nodes are densely populated. Overall, we show analytically and empirically that Coconut dominates the state-of-the-art data series indexes in terms of construction speed, query speed, and storage costs.

PVLDB Reference Format: Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos and Themis Palpanas. Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. PVLDB, 11(6): 677-690, 2018.
DOI: https://doi.org/10.14778/3184470.3184472

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at the 44th International Conference on Very Large Data Bases, August 2018, Rio de Janeiro, Brazil. Proceedings of the VLDB Endowment, Vol. 11, No. 6. Copyright 2018 VLDB Endowment 2150-8097/18/02.
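
The abstract's central idea (give each series a sortable z-order key so that similar series stay adjacent, then build leaves bottom-up from the sorted run) can be sketched as follows. This is a minimal illustration under stated assumptions, not Coconut's actual implementation: it assumes z-normalized SAX summaries with equiprobable Gaussian breakpoints, and the helper names `sax_word`, `zorder_key`, and `bulk_load_leaves` and the parameters `segments`, `bits`, and `leaf_capacity` are hypothetical.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple


def sax_word(series: Sequence[float], segments: int, bits: int) -> List[int]:
    """Hypothetical helper: summarize a series as `segments` SAX symbols of `bits` bits."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # PAA: mean of each equal-length segment (assumes len(series) % segments == 0)
    seg_len = len(z) // segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(segments)]
    # Breakpoints that split N(0, 1) into 2**bits equiprobable regions
    card = 1 << bits
    breakpoints = [NormalDist().inv_cdf(i / card) for i in range(1, card)]
    return [bisect.bisect_right(breakpoints, v) for v in paa]


def zorder_key(word: Sequence[int], bits: int) -> int:
    """Interleave symbol bits: the most significant bit of every segment first,
    then the next bit of every segment, and so on (a z-order / Morton code)."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for symbol in word:
            key = (key << 1) | ((symbol >> b) & 1)
    return key


def bulk_load_leaves(
    dataset: Sequence[Sequence[float]], segments: int, bits: int, leaf_capacity: int
) -> List[List[Tuple[int, int]]]:
    """Hypothetical bulk load: sort (z-order key, series id) pairs once, then cut the
    sorted run into consecutive leaves of `leaf_capacity` entries, so only the last
    leaf can be partially filled."""
    keyed = sorted(
        (zorder_key(sax_word(s, segments, bits), bits), i)
        for i, s in enumerate(dataset)
    )
    return [keyed[i:i + leaf_capacity] for i in range(0, len(keyed), leaf_capacity)]


if __name__ == "__main__":
    # Toy dataset: ten phase-shifted sine waves of length 16
    data = [[math.sin(0.3 * t + phase) for t in range(16)] for phase in range(10)]
    leaves = bulk_load_leaves(data, segments=4, bits=2, leaf_capacity=4)
    print([len(leaf) for leaf in leaves])  # [4, 4, 2]
```

Cutting the sorted sequence by position (here, every `leaf_capacity` entries) keeps every leaf except possibly the last one full. A policy that instead cuts wherever the leading key bits change can leave many leaves sparsely populated, which is the kind of problem the abstract attributes to existing indexes.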

Similar resources

A Tandem Scalable Microwave-Assisted Williamson Alkyl Aryl Ether Synthesis under Mild Conditions

An efficient tandem synthesis of alkyl aryl ethers, including valuable building blocks of dialdehyde and dinitro groups under microwave irradiation and solvent free conditions on potassium carbonate as a mild solid base has been developed. A series of alkyl aryl ethers were obtained from alcohols in excellent yields by following the Williamson ether synthesis protocol under practical mild condi...

Method of Bottom-Up Directed Assembly of Cell-Laden Microgels

Abstract: The paper describes a protocol to fabricate cell-laden microgel assemblies with pre-defined micro-architecture and complexity by a bottom-up approach, which can be used for tissue engineering applications. The assembly process was driven by hydrophobic effect in the water/oil interface. By agitating hydrophilic microgels in hydrophobic medium, the shape-controlled mi...

A Bottom-Up Strategy for Enterprise Ontology Implementation

The benefits of semantics for intelligent and interoperable services have been widely accepted in the computing community. Semantics can also improve the quality of software systems that deal with the every day operations of an enterprise. However, building and utilizing an ontology in the enterprise environment is very difficult and risky since ontology inference is complex, slow, and often un...

Elimination of Waste and Inefficient Facilities in Existing Buildings for Sustainability in Developing Nations

A major reason why many developing nations have not made significant advancement in sustainable development (SD) agenda is the neglect of existing building stock which forms the bulk of built assets. Although sustainable development is a universal challenge, it cannot be approached in the same way for all nations, but rather practical response can be defined nationally or locally. This paper re...

A Hybrid Approach to Extraction and Refinement of Building Footprints from Airborne Lidar Data

This work presents a combined bottom-up and top-down approach to extraction and refinement of building footprints from airborne LIDAR data. Building footprints are interesting for many applications in urban planning. The cadastral maps, however, may be limited for certain areas or not be updated frequently. Airborne laser scanning data is therefore considered by many people in the last decade a...

Journal title:
  • PVLDB

Volume 11, Issue

Pages -

Publication date: 2018